Experiments on authorship attribution by intertextual distance in English

Author

  • Dominique Labbé
Abstract

How can it be said that texts are "near" or "distant" from one another? Are different texts by a single author more similar than texts by different authors? To answer these questions, a method is proposed that combines the calculation of intertextual distance with automatic clustering and tree classification. A blind test and some additional experiments show that this method offers an interesting tool for non-traditional authorship attribution.
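
The abstract does not spell out the distance formula. As a rough sketch, assuming the scaled-frequency form usually associated with Labbé's intertextual distance (the word counts of the longer text are scaled down to the length of the shorter one, and the absolute differences are summed and normalised), the computation might look like this in Python. The function name and the whitespace tokenisation are illustrative assumptions, not the paper's implementation.

```python
from collections import Counter


def intertextual_distance(tokens_a, tokens_b):
    """Relative distance in [0, 1] between two token lists.

    Assumed form (not taken from the abstract): counts of the longer
    text are rescaled to the length of the shorter text, then the
    absolute count differences are summed over the joint vocabulary.
    """
    if not tokens_a or not tokens_b:
        raise ValueError("both texts must contain at least one token")

    # Make A the shorter text so that B is always the one scaled down.
    if len(tokens_a) > len(tokens_b):
        tokens_a, tokens_b = tokens_b, tokens_a

    freq_a = Counter(tokens_a)
    freq_b = Counter(tokens_b)
    n_a = len(tokens_a)
    scale = n_a / len(tokens_b)

    vocabulary = set(freq_a) | set(freq_b)
    # Counter returns 0 for words absent from one of the texts.
    total_diff = sum(abs(freq_a[w] - freq_b[w] * scale) for w in vocabulary)

    # The scaled B has the same total size as A, hence the 2 * n_a.
    return total_diff / (2 * n_a)


if __name__ == "__main__":
    text_1 = "the cat sat on the mat".split()
    text_2 = "the dog sat on the log near the door".split()
    print(intertextual_distance(text_1, text_2))
```

Under these assumptions, identical texts give 0 and texts with no shared vocabulary give 1. A matrix of such pairwise distances is the kind of input that a clustering or tree-building step could then work from.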

Similar resources

A Tool for Literary Studies: Intertextual Distance and Tree Classification

How to measure proximities and oppositions in large text corpora? Intertextual distance provides a simple and interesting solution. Its properties make it a good tool for text classification, and especially for tree-analysis which is fully presented and discussed here. In order to measure the quality of this classification, two indices are proposed. The method presented provides an accurate too...

About Labbé's "intertextual distance"

In the 2001, Volume 8, Number 3, issue of the Journal of Quantitative Linguistics (pp. 213 – 231) M. M. Dominique and Cyril Labbé published a paper entitled “Inter-Textual Distance and Authorship Attribution. Corneille and Molière”. Dominique and Cyril Labbé (hereafter referred to as DCL) propose a new formula for the computation of dissimilarity between texts, as well as a distances scale. The...

Authorship Attribution: A Comparative Study of Three Text Corpora and Three Languages

The first objective of this paper is to carry out three experiments intended to evaluate authorship attribution methods based on three test collections available in three different languages (English, French, and German). In the first we represent and categorize 52 text excerpts written by nine authors and taken from 19th century English novels. In the second we work with 44 segments from French n...

Who Wrote this Novel? Authorship Attribution across Three Languages

Based on different writing style definitions, various authorship attribution schemes have been proposed to identify the real author of a given text or text excerpt. In this article we analyze the relative performance of word types or lemmas assigned to represent styles and texts. As a second objective we compare two authorship attribution approaches, one based on principal component analysis (P...

N-gram-based Author Profiles for Authorship Attribution

We present a novel method for computer-assisted authorship attribution based on character-level n-gram author profiles, which is motivated by an almost-forgotten, pioneering method in 1976. The existing approaches to automated authorship attribution implicitly build author profiles as vectors of feature weights, as language models, or similar. Our approach is based on byte-level n-grams, it is l...

Journal title:
  • Journal of Quantitative Linguistics

Volume 14

Publication date 2007